ABSTRACT

The question of whether tests can be both 
curriculum-neutral and effective means of monitoring and motivating 
educational practice is discussed. Educational reform and testing are 
intimately linked, as tests are widely viewed as a means of 
educational improvement. Tests/assessments influence educator 
behavior by stimulating them to assure that their students perform 
well. Tests/assessments used for public accountability or program 
evaluation purposes affect the curriculum. A new vision of 
education — a thinking-oriented curriculum (TC) for all students — is 
considered, in which education focuses on higher-order abilities, 
problem solving and thinking, and the ability to go beyond the 
routine and exercise personal judgment. Current tests that are 
inimical to a TC are discussed. To assess the extent to which 
decomposition and decontextualization (two key assumptions underlying 
standardized testing) permeate today's achievement tests, reading 
comprehension, language, and mathematics test batteries that are 
widely used in educational assessment are analyzed. Standardized 
tests fare badly when judged against the criterion of assessing and 
promoting a TC. They embody a view of education that defines 
knowledge and skill as a collection of bits of information and they 
demand fast non-reflective replies. Alternative performance 
assessments for a TC, including open-ended writing examinations 
(essays) and portfolio assessments, help release educators from the 
pressure toward fractionated low-level forms of learning that are 
rewarded by most current tests, and they also set positive standards 
for an educational system that strives to cultivate thinking. Tied to 
curriculum and designed to be taught to, performance assessments can 
be essential tools for raising authentic educational achievement. A 
25-item list of references is included. (RLC) 
Tests as Standards of Achievement 
Lauren B. Resnick 
and 

Daniel P. Resnick 
Carnegie Mellon 

In America, educational reform and testing are intimately linked. 
Test scores signal the need for reform, as evidenced by the attention paid 
to declining scores on college entrance exams and standardized tests, to 
Americans' weak ranking in international comparisons, and to the 
percentages of students performing poorly on certain kinds of items in 
our national assessments. 

Tests are also widely viewed as instruments for educational improve- 
ment. Calls for better performance by American schools are almost 
always accompanied by increases in the amounts of testing done in the 
schools. New tests — and more active scrutiny of tests already in place 
— are frequently prescribed, both as a source of information for a 
concerned public and as a form of quality control and an incentive for 
better performance by educators and students. 

At the same time, the rhetoric surrounding the introduction and 
interpretation of assessment programs often suggests that tests are not 
meant to influence curricula and teaching directly. This rhetoric of 
curriculum-neutral tests accords well with American traditions of local 
control over education, producing a profound and continuing resistance 
to any attempt to impose a curriculum from outside a school district. If 
tests and assessments are considered curriculum-neutral — not geared 
to any particular instructional program and not imposing any particular 
set of goals or practices — they can be incorporated easily into the 
ideology of local educational control. If tests are recognized as guiding 
or constraining the curriculum, they become problematic within our 
educational ideology. 

How are educators and the public at large to make sense of this 
discussion? Can tests be both curriculum-neutral and an effective means 
of monitoring and motivating educational practice? Are tests only 
indicators (see Fuhrman, 1988; Murnane & Raizen, 1988) of how well schools 
are performing, without a direct influence on teaching? Or do they 
influence the school curriculum? Answers to these questions lie not in a 
theory of how tests should be used, but in a dispassionate analysis of the 
ways in which assessments function as elements in the social system of 
schooling. 

In assessing complex systems, we often aim for indirect indicators of 
desired properties rather than for direct measurements. For example, to 
obtain an indicator of the amount of ambient heat in the air and thus 
how comfortable a room is for its occupants, we examine the height of 
mercury in a confined column and take a numerical temperature 
reading. We do not really care about the height of the mercury, however. 
What we normally care about is the physical comfort of people in the 
room. If we were to measure comfort directly, we would examine to 
what extent people were sweating, shivering, or showing other signs of 
physical discomfort, or ask them to rate their degree of comfort. One 
reason for using the temperature indicator is its unobtrusive character; 
taking a thermometer reading does not change the degree of comfort in 
the room. 

Discussions of educational testing and educational standard setting 
often use the language of indicators. Educational tests, however, do not 
share the unobtrusiveness of indicators used in other measurement 
systems. We cannot place a "test thermometer" in a classroom without 
expecting to change conditions in the room significantly. Because educa- 
tional tests are used in a social rather than a physical system, measure- 
ments that are made known to actors in the system can be expected to 
affect future actions. Molecules of air are not prompted to produce a 
particular temperature, but teachers and school principals can be moti- 
vated to produce test scores in an acceptable range. Any educational 
assessment that receives publicity will stimulate educators to assure 
that their students perform well on that assessment. 

This power of tests and assessments to influence educator behavior is 
precisely what makes them potent tools for improving educational 
standards. Tests are introduced not just to provide neutral indicators of 
the education system's performance, but also in the hope of upgrading 
curriculum, teaching, and academic performance. There is considerable 
evidence that this strategy works, insofar as it produces a rise in test 
scores. Even in school districts with an official policy against teaching to 
tests, considerable attention in the press or elsewhere to test scores 
causes teachers to adapt their teaching to the tests. Often, the effects of 
this adaptation become visible only after a new test, with different 
emphases, is adopted by or imposed on the district. Test scores in grade 
equivalent or other comparative terms then drop. 

To account for this recurrent observation, several analysts have fo- 
cused attention on the extent to which test items and curriculum activi- 
ties correspond. When overlap between test and curriculum is high, test 
scores are high; when overlap decreases, so do test scores (Leinhardt & 
Seewald, 1981). School districts and teachers try to maximize overlap by 
choosing tests that match their curriculum. When they cannot control 
the tests, they strive for overlap by trying to match curriculum to the 
tests; i.e., by "curriculum alignment." In the first few years of a test's 
use, overlap increases as the curriculum is aligned. When a new test is 
imposed, overlap suddenly decreases, because the curriculum cannot 
change as quickly as the test can. 
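
The notion of overlap can be made concrete with a minimal sketch. It is not the measure used by Leinhardt and Seewald. It assumes, hypothetically, that every test item and every curriculum activity can be tagged with a single objective label, and it defines overlap as the fraction of test items whose objective is practiced somewhere in the curriculum. The function name `overlap` and the sample objectives are illustrative assumptions.

```python
def overlap(test_items, curriculum):
    """Return the fraction of test items whose objective is taught.

    Both arguments are lists of objective labels (hypothetical tags,
    not drawn from any real test battery).
    """
    if not test_items:
        raise ValueError("test_items must not be empty")
    taught = set(curriculum)
    covered = sum(1 for objective in test_items if objective in taught)
    return covered / len(test_items)


# A curriculum that has been aligned to the old test over several years.
curriculum = ["two-digit addition", "subtraction with borrowing", "reading a bar graph"]

old_test = ["two-digit addition", "subtraction with borrowing",
            "two-digit addition", "reading a bar graph"]
new_test = ["two-digit addition", "estimating a sum",
            "multi-step word problem", "reading a bar graph"]

old_overlap = overlap(old_test, curriculum)  # every item is taught
new_overlap = overlap(new_test, curriculum)  # only half the items are taught
```

In this toy case the aligned curriculum covers all of the old test but only half of the new one, which mirrors the sudden drop in overlap described above when a new test is imposed.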

Some educators (e.g., Popham, 1987) have argued that the process of 
curriculum alignment is a favorable one and should be publicly encour- 
aged and supported with tools to make teaching to the tests easier and 
more reliable. Evidence from parts of the country in which measure- 
ment-driven instruction has been adopted indicates that such instruc- 
tion can, by focusing attention on a small set of desired objectives, 
improve performance on a particular set of test items. But this apparent 
success may actually mask stagnation or even decline in the kind of 
school achievement that is the real goal of educational reform today. For 
when the stakes are high — when school ratings and budgets or teacher 
salaries depend on test scores — efforts to improve performance on a 
particular assessment instrument seem to drive out most other educa- 
tional concerns. 

Shepard (1988) has studied this process in Texas, where teachers were 
given materials suggesting instructional strategies for each of the objec- 
tives on the state assessment. The strategies, which were specific to the 
test item forms, promised teachers who used them high test perform- 
ance for their students, because the curricula would be perfectly aligned 
with the tests. Commercially sold programs to help students learn 
test-taking skills, Shepard found, are also closely tied to specific item types 
that appear on the major standardized test batteries. Under these 
conditions, the range of skills taught is restricted, and slight variants in format 
that might be equally valid ways of exercising a skill are ignored, in 
favor of drilling students on the precise item types they will encounter 
on the tests. Other investigators (e.g., Cohen, 1987; Kellaghan, Madaus, 
& Airasian, 1980; Romberg, Zarinnia, & Williams, 1989) have further 
documented the tendency of high-stakes tests to progressively restrict 
curricular attention to the objectives that are tested and to the particular 
item types that will appear on the tests. 

Whether we like it or not, what is taught and what is tested are 
intimately related. Public accountability systems will eventually influ- 
ence what is taught and how it is taught in the schools. We must think of 
every test or assessment used for public accountability or program 
evaluation purposes as an instrument that will affect the curriculum. 
For those who would use tests as a means of monitoring school achieve- 
ment, three principles may serve as guidelines. 

• You get what you assess. Educators will teach to tests if the tests 
matter in their own or their students' lives, making tests potential 
tools in educational reform. Tests must be carefully crafted to sample 
directly those educational performances that are valued. Indicators 
of desired goals, no matter how well they may correlate with the 
truly desired outcome, are not good public accountability measures. 
For example, multiple-choice tests can be designed to correlate very 
highly with written composition grades. Such tests are good indica- 
tors of composition skill, but if we put many of them into the testing 
system, we must expect children to practice answering multiple- 
choice questions. In contrast, if we put debates, discussions, essays, 
and problem solving into the testing system, children will spend time 
practicing those activities. 

• You do not get what you do not assess. What does not appear on 
tests tends to disappear from classrooms in time. If the goals of 
solving complex problems or writing extended essays are education- 
ally important, those activities need to be sampled directly in an 
assessment program aimed at encouraging improved instruction. It 
is not sufficient to test the basics (a common strategy in today's 
assessment programs) or to assume that preparing for the tests will 
take minimal time and that teachers can then go on to other higher- 
order abilities. 

• Build assessments toward which you want educators to teach. This 
principle follows directly from the first two and lies at the heart of the 
matter. Assessments should be designed so that when teachers do the 
natural thing — that is, prepare their students to perform well — they 
exercise the kinds of abilities and develop the skills and knowledge 
that are the real goals of educational reform. This principle assumes 
that what is in the assessment will be practiced in the classroom in a 
similar form. It proposes the central question for any assessment 
exercise: "Is this what we want students to be doing with their 
educational time?" 

The Challenge of the Thinking Curriculum 

By placing curriculum at the heart of testing decisions, these prin- 
ciples assert that tests must be chosen to assess directly and, thereby, 
promote the goals considered most central and important in education. 
Judged in these terms, most current tests are severely wanting. They are 
tuned to a curriculum of the past, one not suited to today's social and 
economic conditions. 

In the last several years, a new vision of education has emerged, 
fueled partly by the needs of a changing economy and partly by recent 
research on learning and cognition. According to this view, education 
must focus on higher-order abilities, on problem solving and thinking, 
on the ability to go beyond the routine and to exercise personal 
judgment. Analyses of how technology is affecting the workplace and 
communication point to the need for workers at all levels to understand the 
technical systems they use, so they can participate in dispersed manage- 
ment systems requiring judgment and decision making (Resnick, 1987b; 
Scribner, 1984; Zuboff, 1988). Furthermore, working conditions are likely 
to change several times during an individual's work life, requiring a 
capacity for adaptive learning. Employers are finding that students now 
leaving high school are not prepared to function well in the work 
environments they enter. Like colleges, employers are calling on schools 
to provide educational programs that enable graduates to reason and 
think, not just perform routine operations. 

A thinking-oriented curriculum for all constitutes a significant new 
educational agenda. Although it is not new to include thinking, problem 
solving, and reasoning in some students' school curriculum, it is new to 
include it in everyone's curriculum. It is new to aspire seriously to make 
thinking and problem solving regular aspects of the school program for 
the entire population, even minorities, non-English speakers, and 
economically disadvantaged children. Developing educational programs 
that assume all individuals, not just the elite, can become competent 
thinkers is a new challenge. 

To meet this challenge, thinking must pervade the entire school 
curriculum for all students, from the earliest grades. One of the most 
important findings of recent research is that the kinds of mental proc- 
esses associated with thinking are not restricted to an advanced or 
higher-order stage of mental development (Resnick, 1987a; Resnick & 
Klopfer, 1989). Instead, thinking and reasoning are intimately linked to 
successful learning of even elementary levels of reading, mathematics, 
and other school subjects. 

The traditional view — that the basics can be taught as routine skills, 
with thinking and reasoning to follow later — can no longer guide our 
educational practice. We know that one cannot effectively memorize 
without organizing knowledge. Facts acquired without structure and 
rationale disappear quickly. Children cannot understand what they 
read without making inferences and using information that goes be- 
yond the written text. They cannot become good writers without engag- 
ing in complex planning and self-evaluation. It is not possible for them 
to learn basic math skills well if they only memorize rules for manipulat- 
ing written numerical symbols. Science learning requires students to 
build explanatory theories they can believe. All of this means that the 
skills we are accustomed to calling higher-level are part of the most basic 
competencies. 

The thinking curriculum does not imply that instruction in processes 
of reasoning is a substitute for acquiring substantial knowledge. In- 
stead, recent research teaches us to be highly respectful of knowledge as 
a requirement for good thinking. People who know more about a topic 
reason more profoundly about it than people who know little about it. 

But the knowledge required for good thinking can only be acquired 
through processes of thinking. For concepts and organizing knowledge 
to be mastered, they must be used generatively — that is, they have to be 
called on over and over again, as ways to link, interpret, and explain 
new information. Education requires an intimate linking of thinking 
processes with knowledge content. This in turn calls for a 
reorganization of schooling, so that thinking suffuses the curriculum and is 
demanded in every subject. 

Current Tests: Inimical to the Thinking Curriculum 



In light of the demands of the thinking curriculum, most current tests 
work against the reforms required in our educational system. Testing 
practice remains essentially unchanged from the era in which it was 
considered enough for schools to teach mastery of routine skills — 
doing simple computations, reading predictable texts, reciting civic or 
religious codes. Goals such as interpreting unfamiliar texts, 
constructing convincing arguments, understanding complex systems, 
developing approaches to problems, or negotiating problem resolution in a 
group were reserved for an elite. 

Two key assumptions, decomposability and decontextualization, 
underlie standardized testing technology and practice. These assumptions 
were compatible with the routinized skill goals of the past and with the 
psychological theories of the first part of this century. They are, how- 
ever, incompatible with thinking goals for education and with what we 
know today about the nature of human cognition and learning. 

Decomposability. Psychological theories of the 1920s assumed that 
thought could best be described as a collection of independent pieces of 
knowledge. That assumption can be clearly recognized in the work of 
psychologist Edward L. Thorndike, which profoundly influenced 
instruction and testing from the 1920s onward. In 1922, Thorndike 
published The Psychology of Arithmetic, in which he showed how the 
content of the elementary school arithmetic curriculum could be ana- 
lyzed as a collection of "bonds" between stimuli and responses. 
Thorndike proposed that the task of arithmetic instruction was to exer- 
cise the bonds that comprise arithmetic, rewarding correct responses 
and "stamping out" incorrect ones. Under this model, students who 
acquired all of the bonds could be said to know arithmetic completely. 
Students who acquired fewer bonds, or who learned them to a less 
reliable criterion of performance, could be said to have measurably less 
arithmetic knowledge. 

With this analysis of the nature of arithmetic knowledge and skill, 
constructing efficient, objective tests posed little problem. It was im- 
practical to test all possible bonds, but samples could easily be tested on 
any given occasion. If neither students nor teachers knew exactly which 
arithmetic facts or procedures would appear on a given test, they had to 
practice all of them, or all in a given subsection of a curriculum, in order 
to perform well. According to Thorndike, performance on a collection of 
specific items constituted a valid indicator of how much of the whole 
body of arithmetic a child knew. 

This kind of sampling of independent bits of knowledge and skills, 
now enhanced by much more sophisticated psychometric tools and 
theories, remains the basic strategy for standardized testing. But the 
decomposability assumption has been seriously challenged by recent 
cognitive research, which recognizes that complicated skills and compe- 
tencies owe their complexity not just to the number of components they 
engage, but also to interactions among the components and heuristics 
for calling upon them. 

Complex competencies, therefore, cannot be defined just by listing all 
their components. Information-processing theories of cognition (e.g., 
Anderson, 1983; Newell & Simon, 1972) analyze cognitive performances 
into complexes of rules, but performances critically depend on interac- 
tions among those rules. Each rule can be thought of as a component of 
the total skill, but the rules are not defined independently of one an- 
other. The competence of a problem-solving system thus depends on 
how the complex of rules acts together. Other cognitive theories, which 
stress the role of structured knowledge and organizing principles in 
learning and thinking, move even further from the decomposition as- 
sumption. 

All of this suggests that efforts to assess thinking and problem- 
solving abilities by identifying separate components of those abilities 
and testing them independently interfere with effectively teaching 
such abilities. Assessing separate components encourages exercises in 
which isolated components are practiced. But since the components do 
not add up to thinking and problem solving, students who practice only 
the components are unlikely to learn real problem solving or 
interpretive thinking. 

Decontextualization. The second major assumption built into 
standardized tests asserts that each component of a complex skill is a fixed 
entity that will take the same form wherever it is used. If students know 
how to distinguish a fact from an opinion, for example, they can do so 
under all conditions of argument and debate, in all knowledge contexts. 
Under this assumption, it makes sense to select key critical thinking 
skills for decontextualized practice in school. 

But the assumption no longer appears valid. Recent developments in 
the epistemology and philosophy of science (e.g., Lakatos, 1978; Toul- 
min, 1972) show that there is no absolute line between fact and theory, 
data and interpretation. Instead, what is counted as fact depends on 
tools and instruments with built-in theories, and on communally 
accepted methods for deciding among competing assertions. Thus, 
history and literature, as well as science and mathematics, must be 
understood as interpretive domains in which knowledge and skill cannot be 
detached from their contexts of practice and use. Educationally, this 
suggests that we cannot teach a skill component in one setting and 
expect it to be applied automatically in another. We cannot validly 
assess a competence in a context very different from that in which it is 
practiced and used. In writing, for example, decontextualized editing 
exercises, a common element in standardized test batteries, do not 
reveal what people do when they edit their own work. If we are trying to 
educate people who can craft phrases and sentences to convey intended 
meanings, editing tests set a false direction. Such decontextualization 
does violence to the kinds of abilities we seek. 

To gauge the extent to which the decomposition and decontextuali- 
zation assumptions permeate today's achievement tests, we examined 
the standardized test batteries widely used in educational assessment 
by individual school districts and in state assessments of educational 
quality as part of mandated testing programs. 

Reading comprehension. Reading comprehension tests generally 
present short passages (an average of 250-350 words in the grades 8-11 
tests we analyzed), together with multiple short questions. In asking for 
bits of information rather than interpretation of an extended passage, 
these tests reflect the decomposability assumption, treating knowledge 
and skill as accumulations of isolated pieces of information and not as 
coherent, interactive systems. Furthermore, the tests encourage quick 
finding of answers, rather than reflective interpretation. The tests we 
examined allow students an average of five to six minutes to read a 
series of brief passages and answer five to eight questions about each. 
Although the tests require a degree of textual interpretation, their iso- 
lated questions rarely examine how students interrelate parts of the text 
and do not require justifications that support the interpretations. The 
nature of the questions and the speed with which they must be 
answered do not invite the kind of reflection and elaboration demanded 
by the thinking curriculum. 

These tests tacitly convey a definition of reading as perusing short 
passages to answer other people's questions. Furthermore, the test 
format suggests that the answers to these questions are already known 
by the person asking them. Under these conditions, reading comprehen- 
sion appears to be a matter of finding predetermined answers, not 
interpreting the written word. Children who practice reading mainly in 
the form in which it appears on the tests have little exposure to the 
demands and reasoning possibilities of the thinking curriculum. 

Language. The other standardized subtests devoted to language 
engage students in even less contextualized and extended thinking than 
the comprehension tests do. Vocabulary tests present decontextualized 
words in questions that must be answered at a rate of two or three per 
minute if the whole test is to be completed. Spelling tests usually contain 
items in which the student selects a proper spelling from among a set of 
misspellings — again at a rate of two or three per minute. There are 
various subtests on language usage, mechanics, expression, and 
punctuation. The items involve recognizing errors and choosing (not 
producing) corrections, usually at the rate of two or three items per minute. 
Students who practiced exercises similar to those that fill the standard- 
ized language tests would not learn to write coordinated, coherent 
prose. They might not even learn to write locally correct prose or to use 
a wide range of vocabulary, for there is good evidence that recognizing 
other people's errors and choosing the correct alternatives are not the 
same processes as those needed to produce good written language. 
These tests carry the decontextualization assumption to the extreme. 

Mathematics. On the whole, the mathematics portions of the stan- 
dardized tests fare even less well than the reading portions on the 
criteria laid out in this paper. All the tests contain major sections in 
which arithmetic computations are to be performed at the rate of one or 
two problems per minute. These are, perhaps, reasonable assessments 
of computational fluency; in any case, they do not claim to assess 
mathematical reasoning. 

Much more disturbing are the subtests aimed at assessing mathe- 
matical concepts and problem solving. These, too, consist of many short, 
unrelated items, usually presenting problems to be solved at the rate of 
about one per minute. Recent publications of the National Council of 
Teachers of Mathematics (1989) and the Mathematical Sciences Educa- 
tion Board (National Research Council, 1989) establish standards for a 
conceptually oriented thinking curriculum in mathematics and call for 
extended mathematical reasoning, including problems that can be at- 
tacked by several different methods. None of the standardized mathe- 
matics tests that we examined even approximates these standards. 

In summary, the standardized tests¹ fare badly when judged against 
the criterion of assessing and promoting a thinking curriculum. They 
embody a definition of knowledge and skill as a collection of bits of 
information, and they demand fast, nonreflective replies. The tests and 
the classroom practices that might be used to prepare for them suggest 
to students a view of knowledge counter to what the thinking curricu- 
lum seeks to cultivate: If you do not know an answer immediately, there 
is no way of arriving at a sensible response by thought and elaboration. 
Although some reading comprehension items demand interpretation of 
and inference from the text, questions are usually presented as isolated 
and unconnected with each other, with no hint that interpreting a text 
might involve an extended line of reasoning. The multiple-choice for- 
mat, furthermore, reinforces the idea that someone else already knows 
the answer to the question, so original interpretations are not expected; 
the task is to find or guess the right answer, rather than to engage in 
interpretive activity. 

Alternative Assessments for the Thinking Curriculum 

Although the tests most widely used to assess achievement are un- 
friendly to the goals of the thinking curriculum, it is possible to develop 
assessments that will actually enhance thinking and reasoning abilities 
when teachers gear their instruction to the tests. Several states have 
recently added to their assessment batteries a writing examination, in 
which students produce essays that are graded by panels of judges to 
yield quantitative scores. A similar writing assessment is now included 
in the National Assessment of Educational Progress (NAEP). 

These writing tasks begin to meet the criteria laid out in this essay for 
educationally appropriate assessment. If students engaged regularly in 
the type of activities found in the assessment, they would be practicing 
writing in an authentic form. It is possible to teach to these tests without 
destroying their educational validity. They represent potentially 
powerful tools of educational improvement, because their presence in the 
assessment system will actively encourage educators to provide signifi- 
cant amounts of writing practice in the curriculum. 



1 Although our detailed analysis was limited to widely used, commercially developed 
tests, most state-developed tests, as well as NAEP, use similar kinds of items and are 
subject to the same general critique. 






The adoption of open-ended writing assessments by several states 
and by NAEP marks an important change in assessment policy. Na- 
tional and state testing agencies are now recognizing that open-ended 
responses can be scored with sufficient reliability to provide data to the 
public and to the educational system on the quality of learning. The use 
of writing assessments has shown the feasibility of using complex, 
integrated performances, rather than series of isolated questions, in 
public accountability systems. Their use has also shown that it is pos- 
sible to derive reliable, credible quantitative measures from judgments 
of these products rather than from precoded correct answers. The suc- 
cessful use of writing assessments as part of public accountability test- 
ing opens the way for a much wider variety of new assessment meth- 
ods, methods that are more compatible with the nation's aspiration to 
education for thinking. 

The essay assessments are an example of a broader category of 
assessments, often called performance assessments. A performance 
assessment uses direct judgments and evaluations of performances, rather 
than indirect indicators of competence. Performance assessments are 
widely used in the arts and athletics. At the Olympics, for example, 
performances with no direct competition (such as diving and gymnas- 
tics) are rated by judges, and the pooled ratings are used to decide who 
wins medals. In music competitions, pianists or violinists perform a 
prescribed or self-selected repertoire; these performances are rated by 
judges, and again pooled ratings determine the winners. Ratings are 
often made on several separate dimensions, as well as on overall, global 
performance, and there may be complex formulas for weighting the 
different judgments. 

A variant of the performance assessment is the portfolio assessment
(see Gardner, in press). This method, frequently used in the visual and 
plastic arts and other design fields, requires individuals to collect their 
work over a period of time, select a sample of the collection that best 
represents their capabilities, and submit this portfolio to a jury or panel 
of judges. 

Although best developed in the arts and athletics, performance and 
portfolio assessments are adaptable to other domains of knowledge and
skill. The simplest form of performance assessment is the written essay, 
which can be used not only to assess writing skill, but also to assess 
knowledge of issues and ideas within a subject. Special forms of essay 
examinations also yield evidence of students' ability to carry out inves- 
tigations and analyses of data. For example, the Advanced Placement 
Test in History contains a document-based question, in which students
must analyze a set of documentary sources to answer an interpretative 
question. In England, where open-ended essay examinations have al- 
ways been part of the graduation examinations taken by students at 16 
and 18 years of age, there have been recent experiments with the use of 
extended project reports as part of the formal assessment system. The 
Manchester Joint Matriculation Board's Engineering Science Examination,
for example, includes both experimental investigations and extended
applied projects as part of the assessment portfolio (Joint Matriculation
Board, 1982). Over the course of several months, candidates conduct
experiments or investigative projects on topics such as measuring strain
in a model suspension bridge, estimating the volume of water flowing in a
river, and designing and building a device for evaluating the
sound-insulating properties of common building materials. They
then submit reports on their plans, execution, outcome, and interpreta- 
tion to the examining board. These reports are rated on each of several 
criteria (e.g., theoretical understanding, planning, design, use of proce- 
dures and equipment, possible alternative solutions considered, quality 
of the written report), and ratings are averaged to yield an overall grade
for each candidate. 
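
The averaging step described above can be illustrated with a minimal Python sketch. The criterion names follow the list in the text, but the 1-to-5 rating scale, the equal weighting of criteria, the sample ratings, and the function name overall_grade are assumptions made for illustration. They are not details reported by the Joint Matriculation Board.

```python
from statistics import mean


def overall_grade(ratings):
    """Average a candidate's per-criterion ratings into one overall grade.

    `ratings` maps a criterion name to a numeric rating. Equal weighting
    of the criteria is an assumption made for this illustration.
    """
    if not ratings:
        raise ValueError("at least one criterion rating is required")
    return mean(ratings.values())


# Hypothetical ratings for one candidate's report (1 = weak, 5 = strong).
candidate = {
    "theoretical understanding": 4,
    "planning": 3,
    "design": 5,
    "use of procedures and equipment": 4,
    "alternative solutions considered": 2,
    "quality of the written report": 3,
}

print(overall_grade(candidate))
```

An examining board that wished to weight some criteria more heavily than others would replace the plain mean with a weighted one. The logic of reducing several judgments to one grade stays the same.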

The Advanced Placement Tests and the Engineering Science Exami- 
nation just described are equivalent to first-year college course exami- 
nations and are intended for only a fraction of the secondary school 
population. Performance assessments are, however, equally suitable for 
younger and broader populations of students. In this country, the Na- 
tional Assessment of Educational Progress has studied the feasibility of 
using open-ended exercises to assess higher-order thinking in science 
and mathematics at grades 3, 7, and 11 (Blumberg, Epstein, MacDonald,
& Mullis, 1986; National Assessment of Educational Progress, 1987).
These assessment exercises included written responses to problems; 
"station activities," in which individual students used equipment to 
investigate a phenomenon and then answered open-ended questions 
about it; and complete experiments that students designed, carried out, 
and reported orally. Some of the exercises were graded on the basis of 
students' written answers — their products. Others required observers to 
rate the processes students revealed as they worked. In both cases, 
graders had to be trained to apply common criteria and standards. 

Videotaping of performances, a technology now inexpensive and 
reliable enough for widespread use, could in the future substantially 
simplify grading when direct observation is necessary. Indeed, the ease 
of videotaping makes possible a wide variety of assessments in which
one examiner interviews a student in a manner designed to probe 
understanding and thinking abilities, and a different set of graders 
scores the student's performance. We are experimenting with this form
of assessment in a primary grades mathematics project (Resnick, Bill & 
Lesgold, 1989). In one kind of assessment interview, a child is asked to 
solve an arithmetic problem and to explain his or her solution. In some 
of our interviews, the child is then shown an alternative solution and 
asked whether it too could be correct, and, if so, how two different 
solutions can yield the same answer. Performances of this kind can be 
graded on multiple criteria, such as the sophistication of the procedure, 
the completeness of the explanation, whether the child explains the 
solution conceptually or only procedurally, and whether the child pro- 
duces the explanation spontaneously or needs to be questioned or 
prompted by the examiner. These multiple ratings could easily be re- 
duced to reliable single scores in order to use these interview results for 
public accountability purposes. 

If widely adopted as part of the public accountability assessment 
system in education, performance assessments (including portfolio 
methods) could not only remove current pressures for teaching isolated 
collections of facts and skills, but could also provide a positive stimulus 
for introducing more extended thinking and reasoning activities in the 
curriculum. The adoption of performance assessment methods would 
require educators to describe the kinds of thinking performances
desired and the criteria of excellent performance much more precisely.

Once introduced, performance assessments would also assure a con- 
tinuing forum for refining objectives and criteria for the thinking cur- 
riculum. There is good evidence for this in the experience of states that 
have used writing assessments for a few years. In those states, educators 
are beginning to discuss whether the kinds of essays students are asked 
to write reflect adequately the educational goals for writing. What is
most striking is that the debates are primarily about curriculum and 
learning goals, not about techniques of assessment. 

Using performance assessments as part of public accountability
programs would require panels of trained judges to evaluate students'
performance using specific criteria, ensuring sufficient agreement
among the judges and reliable, unbiased scores from a set of
individuals. Strategies for such training have been developed by various
groups experienced in open-ended performance assessments in education.
Teachers who have served on judging panels often report that the
training and review sessions help them develop and refine criteria for
their own classroom work. Recognizing this, some school districts in- 
volved in performance assessment programs are discussing possibilities 
for using the training sessions as part of their staff development pro- 
grams. Although it is too early for definitive evidence, experience to 
date suggests that teacher participation in judging and grading per- 
formance assessments can serve an important role in the general up- 
grading of educational standards. 
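
One simple way to check whether a panel has reached "sufficient agreement" is to count how often pairs of judges give the same rating to the same performance. The Python sketch below illustrates that idea only. The function name agreement_rate, the tolerance parameter, and the sample ratings are hypothetical, and operational programs may well rely on more refined reliability statistics.

```python
from itertools import combinations


def agreement_rate(scores_by_judge, tolerance=0):
    """Share of (judge pair, performance) comparisons that agree.

    `scores_by_judge` maps a judge's name to a list of ratings, one per
    performance, in the same order for every judge. Two ratings agree
    when they differ by no more than `tolerance` points.
    """
    judges = list(scores_by_judge.values())
    if len(judges) < 2:
        raise ValueError("need at least two judges")
    if len({len(scores) for scores in judges}) != 1 or not judges[0]:
        raise ValueError("every judge must rate the same, non-empty set")
    agree = total = 0
    for first, second in combinations(judges, 2):
        for x, y in zip(first, second):
            total += 1
            agree += abs(x - y) <= tolerance
    return agree / total


# Hypothetical ratings by three judges of the same four performances.
panel = {
    "judge_a": [4, 3, 5, 2],
    "judge_b": [4, 2, 5, 2],
    "judge_c": [3, 3, 5, 2],
}

print(agreement_rate(panel))               # exact agreement
print(agreement_rate(panel, tolerance=1))  # agreement within one point
```

A low rate after a training session would suggest that the judges are still interpreting the criteria differently and that further review is needed before scores are reported.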

One frequently raised objection to performance assessments is their 
high cost relative to the machine-scorable tests now used. Performance
assessments are more costly than current precoded tests, because mul- 
tiple judges are needed every time an assessment is given. In public 
accountability assessment, in which an educational system, not individ- 
ual students, is being evaluated, the costs of a full assessment program 
can be kept within tolerable bounds by testing less frequently and 
sampling more lightly than is currently done in many mandated testing 
programs. 

Various schemes for light sampling have been developed, including 
methods that examine only some students, and those that examine all
students in a given grade but have each individual take only a portion of
the examination. Determining what justifiable inferences about student
competencies can be made from different sampling procedures requires 
considerable technical, statistical sophistication. These issues are receiv- 
ing continuing attention by certain states, by NAEP, and by commis- 
sions and study panels devoted to questions of assessment practice. 
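
The second kind of light sampling, in which every student takes only a portion of the examination, can be sketched in Python as a simple rotation of exercises across students. This is an illustration under stated assumptions, not a description of any state's or NAEP's actual design. The function name assign_portions, the student labels, and the three-exercise pool are hypothetical.

```python
def assign_portions(students, exercises, per_student):
    """Give each student `per_student` exercises, rotating through the
    pool so that the exercises are spread evenly across students."""
    if not 0 < per_student <= len(exercises):
        raise ValueError("per_student must be between 1 and len(exercises)")
    plan = {}
    start = 0
    for student in students:
        # Take the next `per_student` exercises, wrapping around the pool.
        plan[student] = [
            exercises[(start + i) % len(exercises)]
            for i in range(per_student)
        ]
        start = (start + per_student) % len(exercises)
    return plan


# Hypothetical grade of six students and a pool of three exercises.
students = [f"student_{n}" for n in range(1, 7)]
exercises = ["essay", "station activity", "experiment"]
plan = assign_portions(students, exercises, per_student=1)

for student, portion in plan.items():
    print(student, portion)
```

Each student here does one third of the work of a full examination, yet every exercise is still attempted by a share of the grade. Results of this kind describe the system as a whole and do not yield a complete score for any individual student.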

Thus, a scientifically sound basis exists for controlling costs by reduc- 
ing the amount of testing, rather than by insisting on cheap-to-admini- 
ster, precoded forms. To benefit from light sampling methods, states and 
other educational authorities will have to resist the temptation to com- 
bine accountability assessment with other testing functions requiring 
data on individual students, such as instructional diagnosis or student 
selection and certification. Accountability assessments should not at- 
tempt to offer diagnostic or other instructional management informa- 
tion. Such efforts will drive up the costs of open-ended performance 
assessments, creating pressures to return to multiple-choice tests. In any 
case, large-scale assessments cannot be expected to provide the quick
turnaround that teachers require in order for test-based information to 
be useful in instructional decisions. 

Although attempts to combine several functions in a single testing 
program are not advisable, performance assessments can also be used
for other kinds of educational functions, such as instructional diagnosis
and selection. Selection testing, although it requires examination of 
individual students, needs to be done only once or twice in a student's 
educational career, keeping performance assessment costs within bounds. 
For college selection, the additional costs might even be included in the 
standard testing fee. In the case of diagnostic testing, it is perfectly 
acceptable — even desirable, in many cases — for students' own in- 
structors to grade and interpret students' performances. Although this 
may take more of instructors' time than scoring multiple-choice tests 
with a machine, the time spent is directly relevant to the instructional 
process and should help to focus instructional efforts on the quality of 
students' thinking and reasoning. 

Performance assessments are a feasible and attractive solution to the 
problems laid out in this essay. Properly developed and implemented, 
they allow for reliable measurement of thinking and reasoning in school 
subject matters. They offer a way to release educators from the pressure 
toward fractionated, low-level forms of learning rewarded by most 
current tests, and they can also set positive standards for an educational 
system that aims to cultivate thinking. Tied to curriculum and designed 
to be taught to, performance assessments can be essential tools for
raising authentic educational achievement. 